Abstract. We investigate the number of sets of words that can be formed
from a finite alphabet, counted by the total length of the words in the set.
An explicit expression for the counting sequence is derived from the generating
function, and asymptotics for large alphabet size and for large total word
length are discussed. Moreover, we derive a Gaussian limit law for the number
of words in a random finite language.



1. Introduction and Basic Properties 

Let $f_n = f_n(m)$ denote the number of languages (i.e., sets of words) with total
word length $n$ over an alphabet with $m \ge 2$ symbols [1, 1.37]. For instance, $f_2(2) = 5$
and $f_3(2) = 16$, as seen from the listings

{a,b},{aa},{ab},{ba},{bb} 

respectively 

{a, aa}, {a, ab}, {a, ba}, {a, bb}, {b, aa}, {b, ab}, {b, ba}, {b, bb}, {aaa}, 

{aab}, {aba}, {abb}, {baa}, {bab}, {bba}, {bbb}. 

Another value is $f_2(3) = 12$, illustrated by

{aa}, {ab}, {ac}, {ba}, {bb}, {bc}, {ca}, {cb}, {cc}, {a, b}, {a, c}, {b, c}.
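
The small values quoted above can be confirmed by exhaustive enumeration. The following Python sketch is not part of the original note; the helper names `words_up_to` and `count_languages` are hypothetical, and the code assumes that words are nonempty, as in the listings.

```python
from itertools import combinations, product

def words_up_to(alphabet, max_len):
    """All nonempty words over `alphabet` of length at most `max_len`."""
    return ["".join(w) for k in range(1, max_len + 1)
            for w in product(alphabet, repeat=k)]

def count_languages(n, alphabet):
    """Count sets of distinct nonempty words whose lengths sum to n."""
    words = words_up_to(alphabet, n)
    count = 0
    # A language of total length n contains at most n words.
    for size in range(0, n + 1):
        for language in combinations(words, size):
            if sum(len(w) for w in language) == n:
                count += 1
    return count

print(count_languages(2, "ab"), count_languages(3, "ab"), count_languages(2, "abc"))
```

The search space grows very quickly, so this is only practical for the tiny cases listed in the text.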

The sequence $f_n(2)$ is number A102866 of Sloane's On-Line Encyclopedia of Integer
Sequences. In the present note, we will derive an explicit expression for $f_n(m)$
(Theorem 1 below), establish asymptotics (Sections 2 to 4), and derive a limit law for
the number of words in a random finite language (Section 5).
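The ordinary generating function (ogf) [1, 1.37]

$$ F(z) := \sum_{n=0}^{\infty} f_n z^n = \exp\left( \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} \cdot \frac{m z^k}{1 - m z^k} \right) \tag{1} $$

can be obtained by a standard procedure (the "power set construction" [1, 1.2];
finite languages are sets of sequences built from alphabet elements). Its first terms
are

$$ F(z) = 1 + mz + \tfrac12 m(3m - 1) z^2 + m\left( \tfrac{13}{6} m^2 - \tfrac12 m + \tfrac13 \right) z^3 + O(z^4). \tag{2} $$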
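
As a sanity check on the first terms of the ogf, the coefficients can be computed from the product form $\prod_{k \ge 1} (1 + z^k)^{m^k}$, which is what the power set construction yields (each of the $m^k$ words of length $k$ is either in the language or not). This product form is not written out in the text, and the function name `language_counts` is hypothetical; the sketch below is only an illustration.

```python
from math import comb

def language_counts(m, N):
    """Coefficients f_0..f_N of prod_{k>=1} (1 + z^k)^(m^k), truncated at z^N."""
    f = [1] + [0] * N
    for k in range(1, N + 1):
        c = m ** k                       # number of words of length k
        g = [0] * (N + 1)
        for j in range(N // k + 1):      # choose j distinct words of length k
            ways = comb(c, j)
            for n in range(N - j * k + 1):
                g[n + j * k] += ways * f[n]
        f = g
    return f

for m in (2, 3, 4):
    f = language_counts(m, 3)
    # Compare with the polynomial coefficients of z^2 and z^3 in (2).
    print(m, f, 2 * f[2] == m * (3 * m - 1), 6 * f[3] == m * (13 * m * m - 3 * m + 2))
```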
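Note that

$$ F(z) = \exp\left( \frac{mz}{1 - mz} \right) \phi(z), $$

where

$$ \phi(z) = \phi(z; m) := \exp\left( \sum_{k=2}^{\infty} \frac{(-1)^{k-1}}{k} \cdot \frac{m z^k}{1 - m z^k} \right) $$

is analytic for $|z| < 1/\sqrt{m}$. (Indeed, for $0 < \varepsilon < 1/\sqrt{m}$, $|z| < 1/\sqrt{m} - \varepsilon$, and $k \ge 2$,
we have

$$ m|z|^k \le m|z|^2 < m\,(m^{-1/2} - \varepsilon)^2 = 1 - \varepsilon\sqrt{m}\,(2 - \varepsilon\sqrt{m}) =: 1 - \varepsilon', $$

whence

$$ \left| \frac{m z^k}{1 - m z^k} \right| \le \frac{m|z|^k}{\varepsilon'}, $$

so the series in the exponent converges uniformly on such disks.)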



The dominating singularity of $F(z)$ is thus located at $z = 1/m$, leading to the
rough approximation $f_n(m) \approx m^n$. Clearly (consider languages consisting only of
one word), we have $f_n(m) > m^n$ for $m, n \ge 2$. We will see in Theorem 3 below
that the ratio $f_n(m)/m^n$ is $e^{2\sqrt{n} + O(\log n)}$.

Our first result is an explicit expression for $f_n(m)$, which can be obtained
from (1). To state it, we write $i \vdash n$ if the vector $i = (i_1, \dots, i_n) \in \mathbb{Z}_{\ge 0}^n$
represents a partition of $n$, in the sense that $i_1 + 2 i_2 + \dots + n i_n = n$.



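Theorem 1. For $m \ge 2$ and $n \ge 1$, we have

$$ f_n(m) = \sum_{i \vdash n} \frac{A_1(m)^{i_1} \cdots A_n(m)^{i_n}}{i_1! \cdots i_n!}, \tag{3} $$

where

$$ A_j(m) := \sum_{d \mid j} (-1)^{d-1} \frac{m^{j/d}}{d}, \qquad j \ge 1. $$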
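
The formula of Theorem 1 can be evaluated directly with exact rational arithmetic. The sketch below is illustrative only: the names `A`, `multiplicity_vectors` and `f_explicit` are hypothetical, and a partition is represented as a dictionary mapping each part $j$ to its multiplicity $i_j$ (zero multiplicities are omitted).

```python
from fractions import Fraction
from math import factorial

def A(j, m):
    """A_j(m) = sum over divisors d of j of (-1)^(d-1) * m^(j/d) / d."""
    return sum(Fraction((-1) ** (d - 1) * m ** (j // d), d)
               for d in range(1, j + 1) if j % d == 0)

def multiplicity_vectors(n, largest=None):
    """Yield {part: multiplicity} for every partition of n with parts <= largest."""
    if largest is None:
        largest = n
    if n == 0:
        yield {}
        return
    for part in range(min(n, largest), 0, -1):
        for mult in range(1, n // part + 1):
            for rest in multiplicity_vectors(n - part * mult, part - 1):
                yield {part: mult, **rest}

def f_explicit(n, m):
    """Sum over partitions of prod_j A_j(m)^(i_j) / i_j!."""
    total = Fraction(0)
    for i in multiplicity_vectors(n):
        term = Fraction(1)
        for j, ij in i.items():
            term *= A(j, m) ** ij / factorial(ij)
        total += term
    return total

print([int(f_explicit(n, 2)) for n in range(1, 7)])
```

Although the individual summands are rational, the total is an integer, which is a useful check on an implementation.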



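Proof. We expand the Lambert series [3] in the exponent of $F(z)$, using the geometric
series formula:

$$ F(z) = \exp\left( \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} \sum_{j=1}^{\infty} m^j z^{kj} \right) = \exp\left( \sum_{n=1}^{\infty} A_n(m) z^n \right) = \sum_{k=0}^{\infty} \frac{1}{k!} \left( \sum_{n=1}^{\infty} A_n(m) z^n \right)^k. \tag{4} $$

The $k$-th term here can be expanded as follows, where $M \ge 1$ denotes an arbitrary truncation order and $A_j = A_j(m)$:

$$ \left( \sum_{n=1}^{\infty} A_n z^n \right)^k = \left( \sum_{n=1}^{M} A_n z^n \right)^k + O(z^{M+1}) $$

$$ = \sum_{\substack{i_1 + \dots + i_M = k \\ i_1 + 2 i_2 + \dots + M i_M \le M}} \binom{k}{i_1, \dots, i_M} A_1^{i_1} \cdots A_M^{i_M}\, z^{i_1 + 2 i_2 + \dots + M i_M} + O(z^{M+1}) $$

$$ = \sum_{n=1}^{M} \Biggl( \sum_{\substack{i \vdash n \\ i_1 + \dots + i_n = k}} \binom{k}{i_1, \dots, i_n} A_1^{i_1} \cdots A_n^{i_n} \Biggr) z^n + O(z^{M+1}). \tag{5} $$

Now (3) follows from (4) and (5). □

2. Asymptotics for Large Alphabet Size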



Next we derive the asymptotics of $f_n(m)$ as $m$, the cardinality of the alphabet,
tends to infinity. Define $\kappa_n$ and $\mu_n = \mu_n(m)$ by

$$ \sum_{n=0}^{\infty} \kappa_n z^n = \exp\left( \frac{z}{1 - z} \right) \quad \text{and} \quad \sum_{n=0}^{\infty} \mu_n z^n = \phi(z). $$

Note that $n!\,\kappa_n$ is Sloane's A000262 (several combinatorial interpretations are given
on that web page), and that $\kappa_n$ has the representation

$$ \kappa_n = \sum_{i \vdash n} \frac{1}{i_1! \cdots i_n!}. \tag{6} $$

Then we can write

$$ f_n / m^n = [z^n] \exp\left( \frac{z}{1 - z} \right) \phi(z/m) = \kappa_n + \kappa_{n-1}\mu_1/m + \dots + \kappa_0 \mu_n / m^n. $$

If the dependence of $\mu_n$ on $m$ is not too strong, the first term on the right-hand
side should dominate when $m \to \infty$. This is indeed the case:

Theorem 2. If $n \ge 1$ is fixed and $m \to \infty$, we have

$$ f_n(m) \sim \kappa_n m^n. \tag{7} $$

Proof. Since, as $m \to \infty$,

$$ A_j(m) = m^j + O(m^{j/2}), \qquad j \ge 1, $$

we have

$$ A_j(m)^{i_j} = m^{j i_j} \bigl( 1 + O(m^{-j/2}) \bigr), $$

whence, for $i \vdash n$,

$$ A_1(m)^{i_1} \cdots A_n(m)^{i_n} = m^n \bigl( 1 + O(m^{-1/2}) \bigr). $$

The result thus follows from (3) and (6). □

Note that $\kappa_1 = 1$, $\kappa_2 = \tfrac32$, and $\kappa_3 = \tfrac{13}{6}$, in line with (2).
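
The numbers $\kappa_n$ and the convergence in Theorem 2 are easy to examine numerically. The sketch below computes $\kappa_n$ from the log-derivative recurrence $n\kappa_n = \sum_{j=1}^{n} j\,\kappa_{n-j}$, which follows from $\exp(z/(1-z)) = \exp(z + z^2 + \cdots)$ but is not stated in the text, and compares $f_3(m)$, read off from the $z^3$ coefficient of the ogf, with $\kappa_3 m^3$. The names `kappa` and `f3` are hypothetical.

```python
from fractions import Fraction

def kappa(N):
    """kappa_0..kappa_N, the coefficients of exp(z/(1-z))."""
    k = [Fraction(1)]
    for n in range(1, N + 1):
        # n * kappa_n = sum_{j=1}^{n} j * kappa_{n-j}
        k.append(sum(j * k[n - j] for j in range(1, n + 1)) / n)
    return k

def f3(m):
    """f_3(m) = m * (13/6 m^2 - 1/2 m + 1/3)."""
    return Fraction(m * (13 * m * m - 3 * m + 2), 6)

k = kappa(4)
print([str(x) for x in k])
for m in (2, 10, 100, 1000):
    # The ratio f_3(m) / (kappa_3 m^3) should approach 1 as m grows.
    print(m, float(f3(m) / (k[3] * m ** 3)))
```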

3. Asymptotics for Large Total Word Length

Theorem 3. For large total word length $n$, the sequence $f_n = f_n(m)$ has the
asymptotics

$$ f_n \sim \frac{\phi(1/m)}{2\sqrt{e\pi}} \times \frac{m^n e^{2\sqrt{n}}}{n^{3/4}}, \qquad n \to \infty. \tag{8} $$

More precisely, there is a full asymptotic expansion of the form 
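$$ f_n \sim \frac{\phi(1/m)}{2\sqrt{e\pi}} \times \frac{m^n e^{2\sqrt{n}}}{n^{3/4}} \Bigl( 1 + \sum_{j \ge 1} c_j\, n^{-j/2} \Bigr), \qquad n \to \infty. \tag{9} $$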

Proof. The proof of Theorem 3 is similar to the saddle point analysis [1, Example VIII.7]
of $\exp(z/(1 - z))$, the ogf of $\kappa_n$, slightly perturbed by the presence of
the factor $\phi(z)$. The ogf $F(z)$ is actually Hayman-admissible [1, 2], but carrying
out the saddle point method explicitly gives access to a full asymptotic expansion,
and will be useful for the refined expansions required in Sections 4 and 5. Let us
shift the dominating singularity from $z = 1/m$ to $z = 1$. Then the integrand in
Cauchy's formula

$$ f_n / m^n = [z^n] F(z/m) = \frac{1}{2\pi i} \oint \frac{F(z/m)}{z^{n+1}}\, dz \tag{10} $$

has an approximate saddle point at $z = \hat z := 1 - 1/\sqrt{n}$. We write $z = \hat z e^{i\theta}$, where
$\theta = \arg(z)$ is constrained by

$$ |\theta| < n^{-\alpha}, \qquad \tfrac23 < \alpha < \tfrac34, \tag{11} $$
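so that $z$ lies in a small arc around the saddle point. In this range we have the
uniform expansion

$$ z^{-n-1} = \exp\Bigl( -(n + 1) \log\Bigl( 1 - \frac{1}{\sqrt{n}} \Bigr) - in\theta + O(n^{-\alpha}) \Bigr) = \exp\bigl( \sqrt{n} + \tfrac12 - in\theta + O(n^{-1/2}) \bigr), \qquad n \to \infty. \tag{12} $$

Furthermore,

$$ 1 - z = n^{-1/2} \bigl( 1 - i\theta\sqrt{n} + O(n^{-\alpha}) \bigr), $$

hence

$$ \frac{1}{1 - z} = \sqrt{n} + i\theta n - n^{3/2}\theta^2 + O(n^{1/2 - \alpha}). \tag{13} $$

From (12) and (13) we get

$$ z^{-n-1} \exp\left( \frac{z}{1 - z} \right) = \exp\bigl( -\tfrac12 + 2\sqrt{n} - n^{3/2}\theta^2 \bigr) \times \bigl( 1 + O(n^{1/2 - \alpha}) \bigr). $$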



Since $\phi(z/m)$ is analytic at $z = 1$, the local expansion of the integrand in (10) at
the saddle point $\hat z$ is

$$ \frac{F(z/m)}{z^{n+1}} = \phi(1/m) \exp\bigl( -\tfrac12 + 2\sqrt{n} - n^{3/2}\theta^2 \bigr) \times \bigl( 1 + O(n^{1/2 - \alpha}) \bigr), \tag{14} $$

valid as $n \to \infty$, uniformly w.r.t. $\theta$ in the range (11). Note that

$$ \int_{-n^{-\alpha}}^{n^{-\alpha}} e^{-n^{3/2}\theta^2}\, d\theta \sim \sqrt{\pi}\, n^{-3/4}, $$

so that integrating (14) from $-n^{-\alpha}$ to $n^{-\alpha}$ yields the right-hand side of (8). To
prove (8), it remains to show that the integral from $n^{-\alpha}$ to $\pi$ grows slower (the
other half of the tail is handled by symmetry). There is a $C > 0$ such that



$$ \left| \frac{F(z/m)}{z^{n}} \right| \le C |z|^{-n} \exp \Re\left( \frac{1}{1 - z} \right), \qquad |z| < 1. \tag{15} $$



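If $z = \hat z e^{i\theta}$ lies on the integration contour, then the factor $|z|^{-n}$ in (15) is $O(e^{\sqrt{n}})$.
The remaining factor $\exp \Re(1/(1 - z))$ decreases if $|\theta| = |\arg(z)|$ increases, hence

$$ \int_{n^{-\alpha}}^{\pi} \exp \Re\left( \frac{1}{1 - \hat z e^{i\theta}} \right) d\theta \le \pi \exp \Re\left( \frac{1}{1 - \hat z e^{i n^{-\alpha}}} \right) = \exp\bigl( \sqrt{n} - n^{3/2 - 2\alpha} + O(1) \bigr). $$

(The last line is obtained by recapitulating the derivation of (14), with $\theta = n^{-\alpha}$.)
Hence

$$ \left| \int_{n^{-\alpha} < |\theta| < \pi} \frac{F(z/m)}{z^{n+1}}\, dz \right| \le \exp\bigl( 2\sqrt{n} - n^{3/2 - 2\alpha} + O(1) \bigr). \tag{16} $$

This grows indeed slower than $e^{2\sqrt{n}}/n^{3/4}$, so that the proof of (8) is complete.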

It remains to justify the full expansion (9). First note that it suffices to check
that the central part $\int_{-n^{-\alpha}}^{n^{-\alpha}} d\theta$ of the Cauchy integral has such an expansion, as
the tail estimate (16) lies asymptotically below (9). The full expansion of $\hat z^{-n-1}$
proceeds by powers of $n^{-1/2}$:

$$ \hat z^{-n-1} = \exp\left( \sqrt{n} + \frac12 + \sum_{j=1}^{\infty} \frac{2(j + 1)}{j(j + 2)}\, n^{-j/2} \right). $$
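
The coefficients $2(j+1)/(j(j+2))$ arise from expanding $-(n+1)\log(1 - n^{-1/2})$ in powers of $n^{-1/2}$; the terms $\sqrt{n} + \tfrac12$ are the part of this expansion that does not tend to zero. A quick numerical comparison, with hypothetical helper names, might look as follows.

```python
from math import log1p, sqrt

def exact_exponent(n):
    """-(n+1) * log(1 - n^(-1/2)), the exponent of zhat^(-n-1)."""
    return -(n + 1) * log1p(-1 / sqrt(n))

def series_exponent(n, J):
    """sqrt(n) + 1/2 + sum_{j=1}^{J} 2(j+1)/(j(j+2)) * n^(-j/2)."""
    tail = sum(2 * (j + 1) / (j * (j + 2)) * n ** (-j / 2)
               for j in range(1, J + 1))
    return sqrt(n) + 0.5 + tail

for J in (0, 1, 2, 4, 8):
    # The truncation error should shrink as more terms are included.
    print(J, abs(exact_exponent(100) - series_exponent(100, J)))
```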
Taking more terms in (13) and in the expansion of the analytic function $\phi$, we
readily see that the full local expansion of the integrand in (10) is of the form

$$ \frac{F(z/m)}{z^{n+1}} = \phi(1/m) \exp\bigl( -\tfrac12 + 2\sqrt{n} - n^{3/2}\theta^2 \bigr) \times \Bigl[ 1 + \sum_{k,j} c_{kj}\, n^{k/2} \theta^j \Bigr], $$

where each term in the sum is $o(1)$. The resulting Gaussian integrals are of the
kind

$$ \int_{-n^{-\alpha}}^{n^{-\alpha}} \theta^M e^{-n^{3/2}\theta^2}\, d\theta = n^{-3/4} \int_{-n^{3/4 - \alpha}}^{n^{3/4 - \alpha}} \bigl( t\, n^{-3/4} \bigr)^M e^{-t^2}\, dt \sim \text{const} \times n^{-3(M+1)/4}, \qquad M \text{ even}. $$

(Those with odd $M$ vanish.) This finishes the proof of (9). □
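
The leading term of Theorem 3 can be compared with exact values. The sketch below is not from the original note: it computes $f_n(m)$ with the integer recurrence $n f_n = \sum_{j=1}^{n} B_j f_{n-j}$, where $B_j = j A_j(m)$, which follows from $F = \exp(\sum_j A_j z^j)$ by logarithmic differentiation. The function names are hypothetical, and the series for $\phi(1/m)$ is truncated after 200 terms.

```python
from math import exp, log, pi, sqrt

def counts(m, N):
    """f_0(m)..f_N(m) via n*f_n = sum_j B_j f_{n-j},
    B_j = sum over d | j of (-1)^(d-1) * (j/d) * m^(j/d)."""
    B = [0] * (N + 1)
    for d in range(1, N + 1):
        sign = 1 if d % 2 else -1
        for q in range(1, N // d + 1):       # j = d*q, so j/d = q
            B[d * q] += sign * q * m ** q
    f = [1]
    for n in range(1, N + 1):
        f.append(sum(B[j] * f[n - j] for j in range(1, n + 1)) // n)
    return f

def phi_at_one_over_m(m, terms=200):
    """phi(1/m; m) = exp(sum_{k>=2} (-1)^(k-1)/k * x/(1-x)) with x = m^(1-k)."""
    s = 0.0
    for k in range(2, terms + 2):
        x = float(m) ** (1 - k)
        s += (-1) ** (k - 1) / k * x / (1 - x)
    return exp(s)

def ratio_to_leading_term(n, m, f):
    """f_n(m) divided by the leading term of Theorem 3, computed via logarithms."""
    log_rhs = (log(phi_at_one_over_m(m)) - log(2) - 0.5 * (1 + log(pi))
               + n * log(m) + 2 * sqrt(n) - 0.75 * log(n))
    return exp(log(f[n]) - log_rhs)

f = counts(2, 400)
for n in (25, 100, 400):
    print(n, ratio_to_leading_term(n, 2, f))
```

Since the relative error of the leading term decays only like a power of $n^{-1/2}$, the printed ratios approach 1 slowly.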

4. Joint Asymptotics 

Note that the limits $m \to \infty$ and $n \to \infty$ commute in the following sense: Since
we have $\kappa_n \sim e^{2\sqrt{n}} / (2\sqrt{e\pi}\, n^{3/4})$ [1, Prop. 8.4], the right-hand side of (7) has,
as $n \to \infty$, the same asymptotics as the right-hand side of (8) for $m \to \infty$. We
will now show that letting $m$ and $n$ tend to infinity simultaneously yields the same
result, regardless of their respective speeds.

Theorem 4. If both the word length and the alphabet size tend to infinity, we have

$$ f_n(m) \sim \frac{1}{2\sqrt{e\pi}} \times \frac{m^n e^{2\sqrt{n}}}{n^{3/4}}, \qquad m, n \to \infty. $$

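Proof. The result can be obtained by an adaptation of the proof of Theorem 3. Again
we use Cauchy's formula, with the same saddle point contour as before:

$$ f_n(m) = \frac{m^n \hat z^{-n}}{2\pi} \int_{-\pi}^{\pi} F(\hat z e^{i\theta}/m)\, e^{-in\theta}\, d\theta. \tag{17} $$

We will show that

$$ \phi(\hat z e^{i\theta}/m; m) \to 1, \qquad m, n \to \infty, \quad \text{uniformly w.r.t. } \theta \in [-\pi, \pi]. \tag{18} $$

Assuming this we are done. Indeed, assertion (18) shows at the same time the
validity of the local expansion (14), with $\phi(1/m)$ replaced by $1$, and the persistence
of the tail estimate (16).

To prove (18), notice that

$$ \phi(\hat z e^{i\theta}/m; m) = \exp\left( \sum_{k=2}^{\infty} \frac{(-1)^{k-1}}{k} \cdot \frac{m^{1-k} \hat z^k e^{ki\theta}}{1 - m^{1-k} \hat z^k e^{ki\theta}} \right). \tag{19} $$

We have $|m^{1-k} \hat z^k e^{ki\theta}| \le \tfrac12$ for $m \ge 2$ and $k \ge 2$, hence

$$ \left| \sum_{k=2}^{\infty} \frac{(-1)^{k-1}}{k} \cdot \frac{m^{1-k} \hat z^k e^{ki\theta}}{1 - m^{1-k} \hat z^k e^{ki\theta}} \right| \le \sum_{k=2}^{\infty} m^{1-k} \hat z^k = \sum_{k=2}^{\infty} m^{1-k} \Bigl( 1 - \frac{1}{\sqrt{n}} \Bigr)^k = \frac{(1 - 1/\sqrt{n})^2}{m \bigl( 1 - 1/m + 1/(m\sqrt{n}) \bigr)}. $$

Thus the exponent in (19) is uniformly $o(1)$, which establishes (18). □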
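
The uniform bound on the exponent of $\phi(\hat z e^{i\theta}/m; m)$ can be illustrated numerically. The sketch below truncates the series after 60 terms and samples $\theta$ on a grid; both choices, and the function names, are assumptions made for illustration only.

```python
import cmath
from math import pi, sqrt

def exponent(theta, n, m, terms=60):
    """Truncated series in the exponent of phi(zhat * e^{i theta} / m; m)."""
    zhat = 1 - 1 / sqrt(n)
    w = zhat * cmath.exp(1j * theta)
    s = 0j
    for k in range(2, terms + 2):
        x = m ** (1 - k) * w ** k
        s += (-1) ** (k - 1) / k * x / (1 - x)
    return s

def bound(n, m):
    """Closed form of sum_{k>=2} m^(1-k) * zhat^k with zhat = 1 - 1/sqrt(n)."""
    zhat = 1 - 1 / sqrt(n)
    return zhat ** 2 / (m * (1 - 1 / m + 1 / (m * sqrt(n))))

def max_modulus(n, m, grid=721):
    """Largest modulus of the exponent over a grid of theta in [-pi, pi]."""
    return max(abs(exponent(-pi + 2 * pi * i / (grid - 1), n, m))
               for i in range(grid))

for n, m in ((100, 2), (400, 10), (10 ** 4, 50)):
    print(n, m, max_modulus(n, m), bound(n, m))
```

The bound is smaller than $1/(m - 1)$, so it tends to zero as soon as $m \to \infty$, whatever the speed of $n$.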
5. The Distribution of the Number of Words

A natural parameter to consider is the number $W_n$ of words in a random finite
language of total word length $n$. (The alphabet size $m \ge 2$ is fixed throughout this
section.) The appropriate bivariate ogf, with $z$ marking total word length and $u$
marking number of words, is given by

$$ F(z, u) := \exp\left( \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} \cdot \frac{m z^k u^k}{1 - m z^k} \right). $$

The expected number of words is then

$$ \mathbb{E}[W_n] = f_n^{-1}\, [z^n]\, \partial_u F(z, u)|_{u=1}. \tag{20} $$

Notice that

$$ \partial_u F(z, u)|_{u=1} = F(z) \sum_{k=1}^{\infty} (-1)^{k-1} \frac{m z^k}{1 - m z^k}, \tag{21} $$

so that the asymptotic analysis of $[z^n]\, \partial_u F(z, u)|_{u=1}$ is an easy extension of the one
of $f_n = [z^n] F(z)$ in Section 3. Close to the saddle point, the new factor resulting
from the right-hand side of (21) is

$$ \frac{\hat z}{1 - \hat z} + O(1) = \sqrt{n}\, (1 + o(1)). $$

Hence $[z^n]\, \partial_u F(z, u)|_{u=1} \sim \sqrt{n}\, f_n$, so that, by (20), the expectation of $W_n$ satisfies

$$ \mathbb{E}[W_n] \sim \sqrt{n}, \qquad n \to \infty. $$

Similarly, one can obtain the asymptotics $\sigma(W_n) \sim n^{1/4}/\sqrt{2}$ for the standard
deviation.
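
The expectation can also be computed exactly from (20) and (21), which gives a feel for how quickly $\mathbb{E}[W_n]/\sqrt{n}$ approaches 1. The following sketch uses hypothetical function names and the same integer recurrence for $f_n(m)$ as one would obtain by logarithmic differentiation of $F$; neither is taken from the original note.

```python
from fractions import Fraction
from math import sqrt

def counts(m, N):
    """f_0(m)..f_N(m) via n*f_n = sum_j B_j f_{n-j},
    B_j = sum over d | j of (-1)^(d-1) * (j/d) * m^(j/d)."""
    B = [0] * (N + 1)
    for d in range(1, N + 1):
        sign = 1 if d % 2 else -1
        for q in range(1, N // d + 1):
            B[d * q] += sign * q * m ** q
    f = [1]
    for n in range(1, N + 1):
        f.append(sum(B[j] * f[n - j] for j in range(1, n + 1)) // n)
    return f

def expected_words(m, n):
    """Exact E[W_n] = [z^n] dF/du at u=1, divided by f_n."""
    f = counts(m, n)
    # c_j = [z^j] sum_k (-1)^(k-1) m z^k / (1 - m z^k)
    #     = sum over l*k = j of (-1)^(k-1) m^l
    c = [0] * (n + 1)
    for k in range(1, n + 1):
        sign = 1 if k % 2 else -1
        for l in range(1, n // k + 1):
            c[l * k] += sign * m ** l
    return Fraction(sum(c[j] * f[n - j] for j in range(1, n + 1)), f[n])

for n in (25, 100, 400):
    print(n, float(expected_words(2, n)), sqrt(n))
```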

Theorem 5. The number of words $W_n$ in a random finite language admits a Gaussian
limit law:

$$ \frac{W_n - a_n}{b_n} \longrightarrow \mathcal{N}(0, 1) $$

in distribution, where the scaling constants satisfy $a_n \sim \sqrt{n}$ and $b_n \sim n^{1/4}/\sqrt{2}$.

Proof. As is well known, combinatorial limit laws can often be obtained by an
asymptotic analysis of the probability generating function

$$ \mathbb{E}[u^{W_n}] = f_n^{-1}\, [z^n] F(z, u). \tag{22} $$

Again, we adapt the proof of Theorem 3. If $u$ ranges in a fixed small neighbourhood
of $u = 1$, the expansion (14) generalizes to the uniform local expansion

$$ \frac{F(z/m, u)}{z^{n+1}} = \phi(1/m, u; m) \exp\bigl( -\tfrac12 u + 2\sqrt{un} - u^{-1/2} n^{3/2}\theta^2 \bigr) \times \bigl( 1 + O(n^{1/2 - \alpha}) \bigr), $$

where

$$ \phi(z, u; m) := \exp\left( \sum_{k=2}^{\infty} \frac{(-1)^{k-1}}{k} \cdot \frac{m z^k u^k}{1 - m z^k} \right). $$

Integrating from $\theta = -n^{-\alpha}$ to $n^{-\alpha}$, and taking into account (8), we infer that (22)
has the uniform asymptotics

$$ \mathbb{E}[u^{W_n}] \sim \exp(h_n(u)), \qquad n \to \infty, $$

with

$$ h_n(u) := 2\sqrt{n}\, (\sqrt{u} - 1) - \tfrac12 (u - 1) + \tfrac14 \log u + \log \frac{\phi(1/m, u; m)}{\phi(1/m; m)}. $$
Note that, for $n \to \infty$,

$$ h_n'(1) = \sqrt{n} + O(1), \qquad h_n''(1) = -\tfrac12 \sqrt{n} + O(1), \qquad h_n'''(1) = \tfrac34 \sqrt{n} + O(1), $$

so that the function $h_n(u)$ satisfies the conditions of [1, Theorem 9.13], itself taken
from [5]. We conclude that

$$ \frac{W_n - h_n'(1)}{\bigl( h_n'(1) + h_n''(1) \bigr)^{1/2}} $$

converges in distribution to a standard normal random variable. □
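
For moderate $n$ the exact distribution of $W_n$ can be tabulated, which allows a comparison of its mean and standard deviation with $\sqrt{n}$ and $n^{1/4}/\sqrt{2}$. The sketch below is an illustration, not part of the original argument: it assumes the product form $\prod_{k \ge 1} (1 + u z^k)^{m^k}$ of the bivariate ogf (choose which of the $m^k$ words of length $k$ belong to the language), and the function name is hypothetical.

```python
from math import comb, sqrt

def word_count_distribution(n, m):
    """dist[w] = number of languages of total length n with exactly w words."""
    # table[t][w]: languages of total length t with w words, using only
    # the word lengths processed so far.
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for k in range(1, n + 1):
        new = [[0] * (n + 1) for _ in range(n + 1)]
        for j in range(n // k + 1):              # j words of length k
            ways = comb(m ** k, j)
            for t in range(n - j * k + 1):
                for w in range(n - j + 1):
                    if table[t][w]:
                        new[t + j * k][w + j] += ways * table[t][w]
        table = new
    return table[n]

n = 40
dist = word_count_distribution(n, 2)
total = sum(dist)
mean = sum(w * c for w, c in enumerate(dist)) / total
var = sum(w * w * c for w, c in enumerate(dist)) / total - mean ** 2
print(mean, sqrt(n))                    # exact mean vs. sqrt(n)
print(sqrt(var), n ** 0.25 / sqrt(2))   # exact std vs. n^(1/4)/sqrt(2)
```

At such small $n$ the agreement with the asymptotic scaling constants is only rough, since the corrections are of relative order $n^{-1/2}$.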